API Enrichment Service Refactoring Design
Date: 2025-11-03
Last Updated: 2025-11-09
Status: Approved - Ready for Implementation
Design Session: Phase 2 Code Review - Critical File Analysis
Target File: backend/epgoat/services/api_enrichment.py
Executive Summary
Refactor the 2068-line backend/epgoat/services/api_enrichment.py God class into 14 focused modules using the Chain of Responsibility and Observer patterns. The current implementation violates the Single Responsibility Principle: it carries 10+ responsibilities and contains an 800-line function. The refactoring splits the monolith into 3 layers: Pipeline Orchestration, Handler Chain, and Support Services.
Key Metrics:
- Current: 1 file, 2068 lines, 10+ responsibilities, 800-line function
- Target: 14 files, ~1900 lines total, single responsibility per file, <300 lines per file
- Benefits: 16x reduction in max function size, 100% SOLID compliance, isolated testability
Context
Current Problem
The EPGEnricher class (backend/epgoat/services/api_enrichment.py) is a God class with critical violations:
SOLID Violations:
- Single Responsibility: Handles 10+ responsibilities (API enrichment, team parsing, cache management, regex matching, league inference, sport detection, FloSports mapping, time extraction, cost tracking, statistics learning)
- Open/Closed: Modification required to add new cache layers or enrichment strategies
- Function Length: enrich_event() is 800 lines (16x over 50-line standard)
Type Safety Issues:
- Missing type hints on __init__ parameters
- Bare dict instead of dict[str, Any]
Error Handling Issues:
- Overly broad except Exception catches (3 violations)
- Should use specific exception types
Complexity Issues:
- Deep nesting (6+ levels)
- Cyclomatic complexity >20
- Multiple loops over candidate leagues
Current Architecture

    EPGEnricher (2068 lines)
    ├─ API enrichment orchestration
    ├─ Team parsing (parse_teams_from_payload, _clean_team_name)
    ├─ League inference (multi-strategy)
    ├─ Sport type detection (guess_sport_type_from_channel)
    ├─ 4-layer caching (Enhanced, Details, CrossProvider, Local DB)
    ├─ Regex matching (integration)
    ├─ FloSports mapping (extract_flosports_subcategory, map_flosports_to_league)
    ├─ Time extraction (get_event_times, _is_time_tba)
    ├─ Cost tracking (integration)
    └─ Statistics learning (family → league patterns)
Design Decisions
Decision 1: Breaking Changes Allowed
Context: Need to refactor 2068-line God class violating SOLID principles.
Decision: Allow breaking changes to public API for clean architecture.
Rationale:
- Current API is poorly designed (too many constructor parameters)
- Breaking changes enable true Single Responsibility
- Clean dependency injection requires new interface
- Backward compatibility would force compromises
Consequences:
- ✅ Maximum flexibility for SOLID refactoring
- ✅ Clean architecture patterns possible
- ✅ Isolated component testing
- ⚠️ Callers must update to new API (documented migration path provided)
Decision 2: Chain of Responsibility for Enrichment Pipeline
Context: Sequential enrichment strategy with fallback mechanisms (cache → regex → DB → API).
Decision: Use Chain of Responsibility pattern with 7 handlers.
Rationale:
- Natural fit for sequential fallback logic
- Each handler is independent and testable
- Easy to add/remove/reorder strategies
- Clear separation of concerns
- Handlers can fail gracefully without breaking chain
Alternatives Considered:
1. Strategy Pattern with Coordinator: Would create complex coordinator logic
2. Layered Service Architecture: Tighter coupling between layers
Consequences:
- ✅ Easy to add new enrichment strategies
- ✅ Easy to reorder priority
- ✅ Most handlers <100 lines (APIHandler is the exception at ~150), highly focused
- ✅ Isolated testing per handler
- ⚠️ More classes to manage (7 handlers vs 1 monolith)
Decision 3: Separate Service Classes for Support Logic
Context: Multiple distinct responsibilities (team parsing, league inference, sport detection, etc.).
Decision: Create 6 focused service classes, each <300 lines.
Classes:
1. TeamParser: Team extraction and cleaning
2. LeagueInferencer: Multi-strategy league inference
3. SportTypeDetector: Sport detection and emoji mapping
4. FloSportsMapper: FloSports subcategory mapping
5. TimeExtractor: Event time parsing and TBA detection
6. EventEnrichmentBuilder: Enrichment dictionary construction
Rationale:
- True Single Responsibility Principle
- Each service highly testable in isolation
- Clear interfaces and dependencies
- Easy to mock for testing
- Reusable across different enrichment strategies
Consequences:
- ✅ 100% SRP compliance
- ✅ <300 lines per service
- ✅ Isolated unit testing
- ✅ Clear dependency graph
- ⚠️ More files to navigate (6 services vs inline methods)
Decision 4: Rich Context Object (Mutable)
Context: Need to pass data through handler chain, accumulating parsed information.
Decision: Use mutable EnrichmentContext dataclass that flows through chain.
Rationale:
- Pragmatic Python approach (vs pure functional)
- Clear visibility into pipeline progression
- Easy to debug (inspect context at any stage)
- Accumulates parsed data from services
- Single object to pass around
Alternatives Considered:
1. Immutable Request/Response: Creates object overhead on each handler
2. Separate Request + Accumulator: Two objects to pass around
Consequences:
- ✅ All data in one place
- ✅ Easy to debug
- ✅ Clear progression through pipeline
- ⚠️ Mutable state (acceptable tradeoff in Python)
Decision 5: Observer Pattern for Analytics
Context: Cross-cutting concerns (MatchDebugLogger, CostTracker, FamilyStatsTracker) need to observe enrichment without coupling.
Decision: Use Observer pattern with pluggable observers.
Rationale:
- True separation of concerns
- Easy to enable/disable analytics
- No coupling between handlers and analytics
- Observers can be added without changing handlers
- Single responsibility maintained
Alternatives Considered:
1. Inject Into Each Handler: Bloated handler signatures, tight coupling
2. Context-Based Tracking: Mixes concerns in context object
Consequences:
- ✅ Zero coupling between handlers and analytics
- ✅ Easy to add/remove observers
- ✅ Clean handler interfaces
- ⚠️ Slightly more complex setup (factory handles this)
Decision 6: Each Cache Layer = Separate Handler
Context: 4 caching layers with different strategies and speeds.
Decision: Each cache layer gets its own handler in the chain.
Handler Order:
1. EnhancedMatchCacheHandler (24h channel cache - fastest)
2. EventDetailsCacheHandler (team/date/time lookup)
3. LocalDatabaseHandler (bulk events ±3 days)
4. RegexMatcherHandler (pattern matching)
5. CrossProviderCacheHandler (shared across providers)
6. APIHandler (live API calls - slowest)
7. FallbackHandler (always succeeds)
Rationale:
- Each cache is independent handler
- Easy to add/remove/reorder cache layers
- Each handler testable in isolation
- Clear cache hierarchy
- Natural fit for Chain of Responsibility
Consequences:
- ✅ Flexible cache layer configuration
- ✅ Independent testing per cache layer
- ✅ Easy to measure cache hit rates per layer
- ⚠️ More handlers (7 vs inline checks)
Decision 7: Factory Function for Dependency Injection
Context: Complex dependency graph with 10+ dependencies.
Decision: Use factory function (create_enrichment_pipeline()) as composition root.
Rationale:
- Single place for all dependency wiring
- Optional dependencies (can omit caches, API clients)
- Clear dependency graph
- Testable (can inject mocks)
- Standard pattern (composition root)
Consequences:
- ✅ Clear dependency management
- ✅ Flexible configuration
- ✅ Easy to test (inject mocks)
- ⚠️ One more file (factory.py)
Architecture Overview
High-Level Structure

    ┌───────────────────────────────────────────────────────────┐
    │                    EnrichmentPipeline                     │
    │  (Orchestration: prepares context, runs chain, notifies)  │
    └─────────────────────────────┬─────────────────────────────┘
                                  │
                ┌─────────────────┼─────────────────┐
                │                 │                 │
                ▼                 ▼                 ▼
          ┌───────────┐     ┌───────────┐     ┌───────────┐
          │ Services  │     │ Handlers  │     │ Observers │
          │ (Layer 3) │     │ (Layer 2) │     │ (Cross-   │
          └───────────┘     └───────────┘     │ cutting)  │
                                              └───────────┘
Layer 1: Pipeline Orchestration
EnrichmentPipeline (pipeline.py)
- Coordinates handler chain
- Prepares context with parsed data (using services)
- Notifies observers at key points
- Post-processes successful matches (update caches, learn patterns)
- Size: ~150 lines
EPGEnricher (Facade)
- Optional for gradual migration
- Backward-compatible wrapper
- Delegates to EnrichmentPipeline
- Size: ~50 lines
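The optional facade could look roughly like the sketch below. It assumes the pipeline exposes an enrich() method with the keyword arguments shown in the migration example later in this document; SupportsEnrich and _StubPipeline are hypothetical names used only so the example runs on its own, and target_timezone is typed as tzinfo here (the design uses ZoneInfo) so no time zone database is needed.

```python
from datetime import date, datetime, timezone, tzinfo
from typing import Any, Optional, Protocol


class SupportsEnrich(Protocol):
    """Hypothetical protocol: anything with the pipeline's enrich() signature."""

    def enrich(
        self,
        *,
        channel_name: str,
        family: str,
        payload: str,
        parsed_time: Optional[datetime],
        target_date: date,
        target_timezone: tzinfo,
    ) -> dict[str, Any]: ...


class EPGEnricher:
    """Facade sketch: keeps the old enrich_event() name, delegates all work."""

    def __init__(self, pipeline: SupportsEnrich) -> None:
        self._pipeline = pipeline

    def enrich_event(
        self,
        channel_name: str,
        family: str,
        payload: str,
        parsed_time: Optional[datetime],
        target_date: date,
        target_timezone: tzinfo,
    ) -> dict[str, Any]:
        return self._pipeline.enrich(
            channel_name=channel_name,
            family=family,
            payload=payload,
            parsed_time=parsed_time,
            target_date=target_date,
            target_timezone=target_timezone,
        )


class _StubPipeline:
    """Stand-in for EnrichmentPipeline, used only to exercise the facade."""

    def __init__(self) -> None:
        self.calls = 0

    def enrich(self, *, channel_name: str, **_unused: Any) -> dict[str, Any]:
        self.calls += 1
        return {"matched": False, "channel_name": channel_name}


stub = _StubPipeline()
result = EPGEnricher(stub).enrich_event(
    channel_name="NBA 01",
    family="NBA",
    payload="Lakers vs Celtics",
    parsed_time=None,
    target_date=date(2025, 11, 3),
    target_timezone=timezone.utc,
)
```

Because the facade only forwards arguments, existing callers can keep calling enrich_event() while they are migrated one at a time.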
Layer 2: Handler Chain (7 Handlers)
Each handler implements EnrichmentHandler ABC with _try_enrich(context) method.
Handler Priority Order (fastest → slowest):
- EnhancedMatchCacheHandler (~50 lines)
  - 24-48h TTL, channel-level cache
  - 95%+ hit rate for same-day reprocessing
  - Dependencies: EnhancedMatchCache, EventDetailsCache, EventEnrichmentBuilder
- EventDetailsCacheHandler (~60 lines)
  - Team/date/time lookup
  - Cross-provider event detail storage
  - Dependencies: EventDetailsCache, EventEnrichmentBuilder
- LocalDatabaseHandler (~80 lines)
  - Bulk prefetched events
  - ±3 day search window, 70% similarity threshold
  - Dependencies: EventDatabase, EventDetailsCache, EventEnrichmentBuilder
- RegexMatcherHandler (~70 lines)
  - Pattern-based matching
  - High confidence (90%+) skips API calls
  - Dependencies: MultiStageRegexMatcher, EventEnrichmentBuilder
- CrossProviderCacheHandler (~60 lines)
  - Shared cache across providers
  - Singleton pattern for reuse
  - Dependencies: CrossProviderEventCache, EventEnrichmentBuilder
- APIHandler (~150 lines)
  - Live API calls (TheSportsDB + ESPN fallback)
  - Tries each candidate league in order
  - Dependencies: TheSportsDBClient, ESPNAPIClient, EventEnrichmentBuilder
- FallbackHandler (~30 lines)
  - Builds unmatched enrichment
  - Always succeeds (end of chain)
  - No dependencies
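As an illustration of the LocalDatabaseHandler rule (±3 day search window, 70% similarity threshold), here is a minimal sketch. This document does not say which similarity metric the existing code uses, so difflib.SequenceMatcher is only a stand-in; find_local_match and the event dict keys "name" and "date" are hypothetical.

```python
from datetime import date
from difflib import SequenceMatcher
from typing import Any, Optional


def find_local_match(
    events: list[dict[str, Any]],
    query: str,
    target_date: date,
    window_days: int = 3,
    threshold: float = 0.70,
) -> Optional[dict[str, Any]]:
    """Return the most similar event inside the date window, or None."""
    best: Optional[dict[str, Any]] = None
    best_score = threshold
    for event in events:
        # Skip events outside the +/- window_days search window.
        if abs((event["date"] - target_date).days) > window_days:
            continue
        # Stand-in metric: character-level similarity ratio in [0, 1].
        score = SequenceMatcher(None, query.lower(), event["name"].lower()).ratio()
        if score >= best_score:
            best, best_score = event, score
    return best


events = [
    {"name": "Lakers vs Celtics", "date": date(2025, 11, 5)},
    {"name": "Lakers vs Celtics", "date": date(2025, 11, 20)},
]
match = find_local_match(events, "lakers vs celtics", date(2025, 11, 3))
```

Only the first event qualifies here: the second has the same name but falls outside the three-day window.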
Handler Base Class (handlers/base.py, ~50 lines):

    class EnrichmentHandler(ABC):
        def handle(self, context: EnrichmentContext) -> EnrichmentContext:
            # Error handling + chain logic (full version under Component Specifications)
            ...

        @abstractmethod
        def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
            # Handler-specific logic
            ...
Layer 3: Support Services (6 Services)
Each service is a focused class with single responsibility:
- TeamParser (services/team_parser.py, ~200 lines)
  - parse_teams(payload): Extract teams using various separators
  - clean_team_name(team): Remove ranks, times, noise
  - canonicalize_teams(team1, team2): Resolve aliases to canonical names
  - Dependencies: TeamAliasIndex (optional)
- LeagueInferencer (services/league_inferencer.py, ~250 lines)
  - infer_league(channel_name, payload, family, teams, team_ids): Multi-strategy inference
  - extract_league_token(text): Explicit token detection
  - infer_from_team_ids(team_ids): Prefix-based inference
  - infer_from_team_database(teams): Database lookup
  - Dependencies: None
- SportTypeDetector (services/sport_detector.py, ~100 lines)
  - detect_sport_type(channel_name, payload): Sport detection from keywords
  - get_sport_emoji(sport_type): Map sport to emoji
  - parse_sport_from_title(title): Extract sport prefix (e.g., "Ice Hockey (W)")
  - Dependencies: None
- FloSportsMapper (services/flo_mapper.py, ~80 lines)
  - extract_subcategory(payload): Extract FloSports subcategory (e.g., "flohockey")
  - map_to_league(subcategory): Map to actual league (e.g., "flohockey" → "NHL")
  - Dependencies: None
- TimeExtractor (services/time_extractor.py, ~120 lines)
  - parse_event_time(event, parsed_time, timezone): Extract start/end times
  - is_tba(event, payload): Detect TBA/TBD
  - get_sport_duration(sport_family): Default duration by sport
  - Dependencies: None
- EventEnrichmentBuilder (services/enrichment_builder.py, ~180 lines)
  - build_from_event(event, context, api_source): Build complete enrichment dict
  - build_description(event, family): Rich description from event data
  - get_event_logos(event): Extract logo URLs
  - Dependencies: TimeExtractor, SportTypeDetector
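To make the TeamParser responsibilities concrete, the sketch below shows one way parse_teams and clean_team_name might behave. The separator set and the rank/time patterns are assumptions chosen for illustration; the real rules would be extracted from the existing EPGEnricher methods, and the functions are shown at module level rather than as class methods to keep the example short.

```python
import re
from typing import Optional

# Hypothetical separator set: "vs", "vs.", "v", "v.", "@" and "at".
_SEPARATORS = re.compile(r"\s+(?:vs\.?|v\.?|@|at)\s+", re.IGNORECASE)
# Leading rank such as "#5 " or "(12) ".
_RANK = re.compile(r"^\(?#?\d{1,2}\)?\s+")
# Trailing clock time such as " 7:00 PM" plus anything after it.
_TIME = re.compile(r"\s*\b\d{1,2}:\d{2}\s*(?:am|pm)?\b.*$", re.IGNORECASE)


def clean_team_name(team: str) -> str:
    """Strip a trailing time and a leading rank from a raw team string."""
    team = _TIME.sub("", team.strip())
    team = _RANK.sub("", team)
    return team.strip()


def parse_teams(payload: str) -> tuple[Optional[str], Optional[str]]:
    """Split on the first separator; return (None, None) if there is none."""
    parts = _SEPARATORS.split(payload, maxsplit=1)
    if len(parts) != 2:
        return None, None
    return clean_team_name(parts[0]), clean_team_name(parts[1])


teams = parse_teams("#5 Duke vs North Carolina 7:00 PM")
```

Keeping these as pure string functions is what makes the isolated unit tests listed under Testing Strategy cheap to write.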
Cross-Cutting: Observers
Observer Base (observers/base.py, ~20 lines):

    class EnrichmentObserver(ABC):
        @abstractmethod
        def notify(self, event: str, context: EnrichmentContext) -> None:
            pass
Implementations:
- MatchDebugObserver (~60 lines): Structured diagnostics logging
- CostTrackingObserver (optional): API cost tracking
- FamilyStatsObserver (optional): Pattern learning
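A minimal sketch of how an observer can measure per-layer hit rates without touching any handler. HitRateObserver is a hypothetical example and is not one of the planned observers; the context class is a cut-down stand-in carrying only the fields used here, and the event names 'started' and 'completed' are taken from the data flow in this document.

```python
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrichmentContext:
    """Cut-down stand-in for the real context."""

    channel_name: str
    matched: bool = False
    match_source: Optional[str] = None


class EnrichmentObserver(ABC):
    @abstractmethod
    def notify(self, event: str, context: EnrichmentContext) -> None: ...


class HitRateObserver(EnrichmentObserver):
    """Hypothetical observer: counts which handler produced each match."""

    def __init__(self) -> None:
        self.hits: Counter[str] = Counter()
        self.total = 0

    def notify(self, event: str, context: EnrichmentContext) -> None:
        if event != "completed":
            return  # only finished enrichments are counted
        self.total += 1
        if context.matched and context.match_source:
            self.hits[context.match_source] += 1

    def hit_rate(self, source: str) -> float:
        """Share of completed enrichments matched by the given handler."""
        return self.hits[source] / self.total if self.total else 0.0


observer = HitRateObserver()
observer.notify("started", EnrichmentContext("NBA 01"))
observer.notify(
    "completed",
    EnrichmentContext("NBA 01", matched=True, match_source="APIHandler"),
)
observer.notify("completed", EnrichmentContext("NBA 02"))
```

The handlers never see the observer; the pipeline calls notify() and the observer reads whatever it needs from the context.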
Data Flow

    Input (channel_name, family, payload, parsed_time, target_date, timezone)
      │
      ▼
    [EnrichmentPipeline.enrich()]
      │
      ├─ 1. Create EnrichmentContext
      ├─ 2. Notify observers ('started')
      ├─ 3. Prepare context (parse teams, infer league, detect sport)
      │    │
      │    ├─ TeamParser.parse_teams()
      │    ├─ TeamParser.canonicalize_teams()
      │    ├─ LeagueInferencer.infer_league()
      │    └─ SportTypeDetector.detect_sport_type()
      │
      ├─ 4. Run handler chain
      │    │
      │    ├─ EnhancedMatchCacheHandler → [miss]
      │    ├─ EventDetailsCacheHandler → [miss]
      │    ├─ LocalDatabaseHandler → [miss]
      │    ├─ RegexMatcherHandler → [miss]
      │    ├─ CrossProviderCacheHandler → [miss]
      │    ├─ APIHandler → [HIT!]
      │    │    │
      │    │    └─ EventEnrichmentBuilder.build_from_event()
      │    │
      │    └─ FallbackHandler (if all miss)
      │
      ├─ 5. Post-process (update caches, learn patterns)
      │    │
      │    ├─ CrossProviderCache.store_event()
      │    ├─ EnhancedMatchCache.store_match()
      │    ├─ FamilyStatsTracker.learn_match()
      │    └─ CostTracker.track_family_match()
      │
      └─ 6. Notify observers ('completed')
      │
      ▼
    Output (enrichment dict)
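The six steps above can be sketched as a single function. This is a simplified illustration, not the planned EnrichmentPipeline: handlers and observers are plain callables, the context is a cut-down stand-in, and str.partition replaces the real TeamParser. All names in the block are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Context:
    """Cut-down stand-in for EnrichmentContext."""

    payload: str
    team1: Optional[str] = None
    team2: Optional[str] = None
    enrichment: Optional[dict[str, Any]] = None
    matched: bool = False


Handler = Callable[[Context], None]  # a handler may set enrichment/matched
Observer = Callable[[str, Context], None]


def enrich(
    payload: str,
    handlers: list[Handler],
    observers: list[Observer],
    post_process: Callable[[Context], None],
) -> dict[str, Any]:
    context = Context(payload=payload)  # 1. create context
    for notify in observers:  # 2. notify 'started'
        notify("started", context)
    left, sep, right = payload.partition(" vs ")  # 3. prepare (stand-in parser)
    if sep:
        context.team1, context.team2 = left, right
    for handler in handlers:  # 4. run handlers until one produces a result
        handler(context)
        if context.enrichment is not None:
            break
    if context.matched:  # 5. post-process real matches only
        post_process(context)
    for notify in observers:  # 6. notify 'completed'
        notify("completed", context)
    return context.enrichment or {}


log: list[str] = []


def miss(context: Context) -> None:
    log.append("miss")


def hit(context: Context) -> None:
    log.append("hit")
    context.enrichment = {"home": context.team1, "away": context.team2}
    context.matched = True


result = enrich(
    "Lakers vs Celtics",
    handlers=[miss, hit, miss],
    observers=[lambda event, _context: log.append(event)],
    post_process=lambda _context: log.append("post"),
)
```

The third handler never runs because the second one already produced an enrichment, which mirrors the short-circuit behaviour of the chain.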
Component Specifications
EnrichmentContext (Data Transfer Object)
Location: enrichment/context.py

    from dataclasses import dataclass, field
    from datetime import date, datetime
    from typing import Any, Optional
    from zoneinfo import ZoneInfo


    @dataclass
    class EnrichmentContext:
        """Context object passed through enrichment pipeline."""

        # INPUT (provided by caller)
        channel_name: str
        family: str
        payload: str
        parsed_time: Optional[datetime]
        target_date: date
        target_timezone: ZoneInfo

        # PARSED DATA (populated by services)
        normalized_channel_name: str = ""
        normalized_payload: str = ""
        team1: Optional[str] = None
        team2: Optional[str] = None
        team_ids: tuple[str, ...] = field(default_factory=tuple)
        candidate_leagues: list[str] = field(default_factory=list)
        inferred_league: Optional[str] = None
        sport_type: Optional[str] = None
        sport_emoji: Optional[str] = None

        # RESULT (populated by handlers)
        enrichment: Optional[dict[str, Any]] = None
        matched: bool = False
        match_source: Optional[str] = None

        # METADATA (for debugging)
        handler_attempts: list[str] = field(default_factory=list)
        errors: list[str] = field(default_factory=list)
EnrichmentHandler (Base Class)
Location: enrichment/handlers/base.py

    import logging
    from abc import ABC, abstractmethod
    from typing import Optional

    from ..context import EnrichmentContext

    logger = logging.getLogger(__name__)


    class EnrichmentHandler(ABC):
        """Base class for enrichment handlers (Chain of Responsibility)."""

        def __init__(self, next_handler: Optional['EnrichmentHandler'] = None):
            self._next_handler = next_handler

        def handle(self, context: EnrichmentContext) -> EnrichmentContext:
            """
            Attempt to enrich. If successful, return context with matched=True.
            If unsuccessful, pass to next handler in chain.
            """
            try:
                context.handler_attempts.append(self.__class__.__name__)
                result = self._try_enrich(context)
                if result.matched:
                    result.match_source = self.__class__.__name__
                    return result  # Stop chain
            except Exception as e:
                # Deliberately broad: a failing handler must not break the chain.
                logger.warning(f"{self.__class__.__name__} failed: {e}")
                context.errors.append(f"{self.__class__.__name__}: {e}")

            # Continue chain
            if self._next_handler:
                return self._next_handler.handle(context)
            return context

        @abstractmethod
        def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
            """Handler-specific enrichment logic."""
            pass
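The behaviour of the base class is easiest to see with a small runnable chain. The sketch below repeats a compact copy of the base class so it is self-contained; BrokenCacheHandler and DictCacheHandler are hypothetical handlers invented for the example, and the context carries only the fields the chain touches.

```python
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Optional

logger = logging.getLogger(__name__)


@dataclass
class EnrichmentContext:
    """Only the fields this example needs."""

    channel_name: str
    enrichment: Optional[dict[str, Any]] = None
    matched: bool = False
    match_source: Optional[str] = None
    handler_attempts: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class EnrichmentHandler(ABC):
    def __init__(self, next_handler: Optional["EnrichmentHandler"] = None) -> None:
        self._next_handler = next_handler

    def handle(self, context: EnrichmentContext) -> EnrichmentContext:
        name = type(self).__name__
        context.handler_attempts.append(name)
        try:
            result = self._try_enrich(context)
            if result.matched:
                result.match_source = name
                return result  # stop the chain on the first match
        except Exception as exc:  # a failing handler must not break the chain
            logger.warning("%s failed: %s", name, exc)
            context.errors.append(f"{name}: {exc}")
        return self._next_handler.handle(context) if self._next_handler else context

    @abstractmethod
    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext: ...


class BrokenCacheHandler(EnrichmentHandler):
    """Hypothetical handler that simulates a cache outage."""

    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        raise ConnectionError("cache unavailable")


class DictCacheHandler(EnrichmentHandler):
    """Hypothetical handler backed by an in-memory dict."""

    def __init__(self, cache: dict[str, dict[str, Any]], next_handler=None) -> None:
        super().__init__(next_handler)
        self._cache = cache

    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        cached = self._cache.get(context.channel_name)
        if cached is not None:
            context.enrichment, context.matched = cached, True
        return context


class FallbackHandler(EnrichmentHandler):
    """End of chain: always produces an unmatched enrichment."""

    def _try_enrich(self, context: EnrichmentContext) -> EnrichmentContext:
        context.enrichment = {"title": context.channel_name, "matched": False}
        return context


# The chain is built back to front: each handler receives its successor.
chain = BrokenCacheHandler(
    DictCacheHandler({"NBA 01": {"league": "NBA"}}, FallbackHandler())
)
hit = chain.handle(EnrichmentContext("NBA 01"))
miss = chain.handle(EnrichmentContext("Unknown"))
```

The first handler raises, the error is recorded on the context, and the chain still reaches the dict-backed handler. For the unknown channel every handler is attempted and the fallback supplies the unmatched result.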
Factory Function
Location: enrichment/factory.py

    from typing import Optional


    def create_enrichment_pipeline(
        # API clients
        thesportsdb_client: Optional[TheSportsDBClient] = None,
        espn_client: Optional[ESPNAPIClient] = None,
        # Caches
        enhanced_cache: Optional[EnhancedMatchCache] = None,
        event_details_cache: Optional[EventDetailsCache] = None,
        event_database: Optional[EventDatabase] = None,
        cross_provider_cache: Optional[CrossProviderEventCache] = None,
        # Analytics
        match_debug_logger: Optional[MatchDebugLogger] = None,
        cost_tracker: Optional[CostTracker] = None,
        family_stats_tracker: Optional[FamilyStatsTracker] = None,
        # Other
        team_alias_index: Optional[TeamAliasIndex] = None,
        regex_matcher: Optional[MultiStageRegexMatcher] = None,
    ) -> EnrichmentPipeline:
        """
        Factory function to wire up the entire enrichment pipeline.
        This is the composition root - all dependency injection happens here.
        """
        # Create services
        team_parser = TeamParser(team_alias_index=team_alias_index)
        league_inferencer = LeagueInferencer()
        sport_detector = SportTypeDetector()
        flo_mapper = FloSportsMapper()
        enrichment_builder = EventEnrichmentBuilder(
            time_extractor=TimeExtractor(),
            sport_detector=sport_detector,
        )

        # Create handlers (conditionally based on what's provided).
        # Compare against None: an empty cache object may be falsy.
        handlers: list[EnrichmentHandler] = []
        if enhanced_cache is not None:
            handlers.append(EnhancedMatchCacheHandler(...))
        if event_details_cache is not None:
            handlers.append(EventDetailsCacheHandler(...))
        # ... etc

        # Always add fallback
        handlers.append(FallbackHandler())

        # Create observers
        observers: list[EnrichmentObserver] = []
        if match_debug_logger is not None:
            observers.append(MatchDebugObserver(match_debug_logger))

        return EnrichmentPipeline(
            team_parser=team_parser,
            league_inferencer=league_inferencer,
            sport_detector=sport_detector,
            flo_mapper=flo_mapper,
            handlers=handlers,
            observers=observers,
            cross_provider_cache=cross_provider_cache,
            enhanced_match_cache=enhanced_cache,
            family_stats_tracker=family_stats_tracker,
            cost_tracker=cost_tracker,
        )
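The factory collects handlers in a list, while the base class links them through next_handler. This document does not specify how the pipeline turns the list into a chain, so the helper below is one possible approach rather than the planned implementation; link_handlers and the Handler stand-in class are hypothetical.

```python
from typing import Optional, Sequence


class Handler:
    """Stand-in exposing only the attribute the linker touches."""

    def __init__(self, name: str, next_handler: Optional["Handler"] = None) -> None:
        self.name = name
        self._next_handler = next_handler


def link_handlers(handlers: Sequence[Handler]) -> Handler:
    """Point each handler at its successor and return the head of the chain."""
    if not handlers:
        raise ValueError("at least one handler (the fallback) is required")
    for current, following in zip(handlers, handlers[1:]):
        current._next_handler = following
    handlers[-1]._next_handler = None  # the last handler terminates the chain
    return handlers[0]


ordered = [Handler("cache"), Handler("api"), Handler("fallback")]
head = link_handlers(ordered)

# Walk the chain to confirm the order matches the list order.
names = []
node: Optional[Handler] = head
while node is not None:
    names.append(node.name)
    node = node._next_handler
```

With a helper like this, reordering priorities stays a matter of reordering the list, as the design intends.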
File Organization
New Directory Structure

    backend/epgoat/services/enrichment/
    ├── __init__.py                 # Public API exports
    ├── context.py                  # EnrichmentContext (50 lines)
    ├── pipeline.py                 # EnrichmentPipeline (150 lines)
    ├── factory.py                  # create_enrichment_pipeline (100 lines)
    │
    ├── handlers/
    │   ├── __init__.py             # Handler exports
    │   ├── base.py                 # EnrichmentHandler ABC (50 lines)
    │   ├── cache_handlers.py       # 3 cache handlers (200 lines)
    │   ├── database_handler.py     # LocalDatabaseHandler (80 lines)
    │   ├── regex_handler.py        # RegexMatcherHandler (70 lines)
    │   ├── api_handler.py          # APIHandler (150 lines)
    │   └── fallback_handler.py     # FallbackHandler (30 lines)
    │
    ├── services/
    │   ├── __init__.py             # Service exports
    │   ├── team_parser.py          # TeamParser (200 lines)
    │   ├── league_inferencer.py    # LeagueInferencer (250 lines)
    │   ├── sport_detector.py       # SportTypeDetector (100 lines)
    │   ├── flo_mapper.py           # FloSportsMapper (80 lines)
    │   ├── time_extractor.py       # TimeExtractor (120 lines)
    │   └── enrichment_builder.py   # EventEnrichmentBuilder (180 lines)
    │
    └── observers/
        ├── __init__.py             # Observer exports
        ├── base.py                 # EnrichmentObserver ABC (20 lines)
        └── debug_observer.py       # MatchDebugObserver (60 lines)
TOTAL: ~1,900 lines across 14 focused files (vs 2,068 lines in 1 file)
Public API

    # backend/epgoat/services/enrichment/__init__.py
    from .context import EnrichmentContext
    from .pipeline import EnrichmentPipeline
    from .factory import create_enrichment_pipeline

    # Handler exports (for testing/advanced usage)
    from .handlers import (
        EnrichmentHandler,
        EnhancedMatchCacheHandler,
        EventDetailsCacheHandler,
        LocalDatabaseHandler,
        RegexMatcherHandler,
        CrossProviderCacheHandler,
        APIHandler,
        FallbackHandler,
    )

    # Service exports (for reuse)
    from .services import (
        TeamParser,
        LeagueInferencer,
        SportTypeDetector,
        FloSportsMapper,
        TimeExtractor,
        EventEnrichmentBuilder,
    )

    # Observer exports
    from .observers import EnrichmentObserver, MatchDebugObserver

    __all__ = [
        "EnrichmentContext",
        "EnrichmentPipeline",
        "create_enrichment_pipeline",
        # Handlers
        "EnrichmentHandler",
        "EnhancedMatchCacheHandler",
        "EventDetailsCacheHandler",
        "LocalDatabaseHandler",
        "RegexMatcherHandler",
        "CrossProviderCacheHandler",
        "APIHandler",
        "FallbackHandler",
        # Services
        "TeamParser",
        "LeagueInferencer",
        "SportTypeDetector",
        "FloSportsMapper",
        "TimeExtractor",
        "EventEnrichmentBuilder",
        # Observers
        "EnrichmentObserver",
        "MatchDebugObserver",
    ]
Migration Strategy
Phase 1: Create New Implementation (Parallel)
Timeline: Sprint 1-2
Create new enrichment/ package alongside existing backend/epgoat/services/api_enrichment.py:
- Week 1: Foundation
  - Create context.py (EnrichmentContext)
  - Create handlers/base.py (EnrichmentHandler ABC)
  - Create observers/base.py (EnrichmentObserver ABC)
  - Write unit tests for base classes
- Week 2: Services
  - Implement 6 support services (team_parser, league_inferencer, etc.)
  - Extract logic from existing EPGEnricher methods
  - Write unit tests for each service (aim for 90%+ coverage)
- Week 3: Handlers
  - Implement 7 handlers
  - Wire up dependencies
  - Write unit tests for each handler
- Week 4: Pipeline & Factory
  - Implement EnrichmentPipeline
  - Implement create_enrichment_pipeline factory
  - Write integration tests
Validation: New implementation passes all unit tests, integration tests run successfully.
Phase 2: Update Callers
Timeline: Sprint 3
Find all callers of EPGEnricher and update to use new API:

    # OLD way (api_enrichment.py):
    enricher = EPGEnricher(
        api_key=api_key,
        enable_api=True,
        use_espn_fallback=True,
        failure_tracker=failure_tracker,
        api_cache=api_cache,
        event_database=event_database,
        mismatch_tracker=mismatch_tracker,
        event_details_cache=event_details_cache,
        match_debug_logger=match_debug_logger,
        team_alias_index=team_alias_index,
    )
    result = enricher.enrich_event(
        channel_name=channel_name,
        family=family,
        payload=payload,
        parsed_time=parsed_time,
        target_date=target_date,
        target_timezone=target_timezone,
    )

    # NEW way (enrichment/):
    pipeline = create_enrichment_pipeline(
        thesportsdb_client=TheSportsDBClient(api_key=api_key),
        espn_client=ESPNAPIClient(),
        enhanced_cache=EnhancedMatchCache(),
        event_details_cache=event_details_cache,
        event_database=event_database,
        cross_provider_cache=CrossProviderEventCache(),
        match_debug_logger=match_debug_logger,
        cost_tracker=CostTracker(api_cost_per_call=0.004),
        family_stats_tracker=FamilyStatsTracker(),
        team_alias_index=team_alias_index,
        regex_matcher=MultiStageRegexMatcher(),
    )
    result = pipeline.enrich(
        channel_name=channel_name,
        family=family,
        payload=payload,
        parsed_time=parsed_time,
        target_date=target_date,
        target_timezone=target_timezone,
    )
Steps:
1. Find all EPGEnricher instantiations (grep, IDE search)
2. Update each caller to use factory function
3. Update imports
4. Run tests after each update
5. Commit after each file migrated
Expected Callers:
- backend/epgoat/application/epg_generator.py (main CLI)
- Integration tests
- Any other enrichment workflows
Phase 3: Remove Old Code
Timeline: Sprint 4
After all callers migrated and verified:
- Delete backend/epgoat/services/api_enrichment.py
- Update imports across codebase
- Update documentation references
- Run full test suite
- Create PR with summary of changes
Validation: All tests pass, no references to old API remain.
Testing Strategy
Unit Tests (Component Isolation)
Services (6 test files):
test_team_parser.py:
- test_parse_teams_with_vs_separator
- test_parse_teams_with_at_separator
- test_parse_teams_with_unicode_separators
- test_clean_team_name_removes_rank
- test_clean_team_name_removes_time_patterns
- test_canonicalize_teams_resolves_aliases
- test_canonicalize_teams_skips_college_sports
test_league_inferencer.py:
- test_infer_from_explicit_token
- test_infer_from_team_ids
- test_infer_from_database_both_teams
- test_infer_from_database_single_team
- test_infer_from_alias_pairing
- test_candidate_league_deduplication
- test_flosports_mapping
test_sport_detector.py:
- test_detect_from_title_prefix
- test_detect_from_keywords
- test_get_sport_emoji
- test_flosports_subcategory_detection
test_flo_mapper.py:
- test_extract_subcategory_with_colon
- test_extract_subcategory_without_colon
- test_map_to_league
- test_unmapped_subcategory_returns_none
test_time_extractor.py:
- test_parse_api_time_utc_to_target_timezone
- test_parse_api_time_fallback_to_parsed_time
- test_is_tba_no_time
- test_is_tba_status_indicators
- test_get_sport_duration_by_family
test_enrichment_builder.py:
- test_build_from_event
- test_build_description
- test_get_event_logos
- test_get_event_times
Handlers (3 test files):
test_cache_handlers.py:
- test_enhanced_cache_hit
- test_enhanced_cache_miss_continues_chain
- test_details_cache_finds_by_teams_and_date
- test_details_cache_miss_continues_chain
- test_cross_provider_cache_hit
- test_cross_provider_cache_miss
test_database_handler.py:
- test_local_db_hit
- test_local_db_miss_continues_chain
- test_local_db_uses_search_window
- test_local_db_uses_similarity_threshold
test_api_handler.py:
- test_thesportsdb_success
- test_espn_fallback_on_tsdb_failure
- test_tries_all_candidate_leagues
- test_stops_on_first_match
- test_updates_caches_on_match
Pipeline (1 test file):
test_enrichment_pipeline.py:
- test_enhanced_cache_hit_skips_later_handlers
- test_fallback_through_all_handlers
- test_api_match_updates_caches
- test_observers_notified_on_start_and_complete
- test_context_accumulates_parsed_data
- test_error_in_handler_continues_chain
- test_post_process_updates_all_caches
- test_family_stats_learning_on_match
Integration Tests
test_integration_enrichment.py:
- test_full_pipeline_with_all_handlers
- test_full_pipeline_api_match
- test_full_pipeline_cache_hit
- test_full_pipeline_regex_match
- test_full_pipeline_fallback
- test_cost_tracking_integration
- test_debug_logging_integration
Coverage Target
- Unit Tests: 85%+ coverage per module
- Integration Tests: 90%+ coverage of pipeline orchestration
- Overall Target: 85%+ coverage
Test Isolation
Each test file should:
- Mock external dependencies (API clients, caches, databases)
- Test one component in isolation
- Run in <100ms per test
- Be independent (no shared state)
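A minimal sketch of what an isolated test could look like, using unittest.mock from the standard library. CacheLookup is a hypothetical component invented for the example; the test functions are plain functions so they can be collected by a runner such as pytest (the test runner is not named in this document, so that is an assumption).

```python
from typing import Any, Optional
from unittest.mock import Mock


class CacheLookup:
    """Hypothetical component under test: reads from an injected cache."""

    def __init__(self, cache: Any) -> None:
        self._cache = cache

    def find(self, channel_name: str) -> Optional[dict[str, Any]]:
        return self._cache.get(channel_name)


def test_cache_hit_returns_cached_enrichment() -> None:
    cache = Mock()  # fresh mock per test: no shared state
    cache.get.return_value = {"league": "NBA"}

    assert CacheLookup(cache).find("NBA 01") == {"league": "NBA"}
    cache.get.assert_called_once_with("NBA 01")


def test_cache_miss_returns_none() -> None:
    cache = Mock()
    cache.get.return_value = None

    assert CacheLookup(cache).find("NBA 01") is None
```

Each test builds its own mock, touches no network or disk, and checks one behaviour, which keeps it well within the <100ms budget.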
Benefits Summary
Code Quality Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| File Count | 1 monolith | 14 focused files | +13 files (better organization) |
| Total Lines | 2,068 | ~1,900 | -168 lines (8% reduction) |
| Max File Size | 2,068 lines | 300 lines | 7x reduction |
| Max Function Size | 800 lines | 50 lines | 16x reduction |
| Responsibilities per File | 10+ | 1 | 100% SRP compliance |
| Cyclomatic Complexity | >20 | <10 | >50% reduction |
| Nesting Depth | 6+ levels | 2-3 levels | 50% reduction |
SOLID Compliance
Before:
- ❌ Single Responsibility: 10+ responsibilities
- ❌ Open/Closed: Modification required for new strategies
- ⚠️ Liskov Substitution: N/A
- ⚠️ Interface Segregation: N/A
- ❌ Dependency Inversion: Dependencies hardcoded
After:
- ✅ Single Responsibility: 1 per file
- ✅ Open/Closed: Extend via new handlers
- ✅ Liskov Substitution: All handlers interchangeable
- ✅ Interface Segregation: Focused interfaces (Handler, Observer)
- ✅ Dependency Inversion: Factory injection
Design Patterns Applied
- ✅ Chain of Responsibility: Handler chain
- ✅ Observer: Analytics decoupling
- ✅ Strategy: Interchangeable handlers
- ✅ Factory: Dependency injection
- ✅ Repository: Data access (existing)
- ✅ Service Layer: Business logic (existing)
Maintainability Improvements
Before:
- ❌ Hard to add new enrichment strategies (800-line function)
- ❌ Hard to reorder priorities (deeply nested logic)
- ❌ Hard to test (tightly coupled)
- ❌ Hard to debug (complex state)
After:
- ✅ Easy to add new strategies (new handler class)
- ✅ Easy to reorder priorities (reorder handler list)
- ✅ Easy to test (isolated components)
- ✅ Easy to debug (clear context progression)
Performance Implications
No Performance Regression:
- Same cache hierarchy (same hit rates)
- Same API call patterns
- Same regex matching
- Negligible object creation overhead (<1ms)
Potential Improvements:
- Better cache layer visibility (measure each layer)
- Easier to add new optimization strategies
- Clearer performance bottlenecks
Risks and Mitigations
Risk 1: Breaking Changes Impact Callers
Risk: Callers need to update to new API, could cause temporary breakage.
Mitigation:
- Parallel implementation (no deletion of old code until migration complete)
- Clear migration guide with examples
- Update callers one at a time, test after each
- Keep old code until all callers migrated
Risk 2: Increased Complexity (More Files)
Risk: 14 files vs 1 file could be harder to navigate.
Mitigation:
- Clear naming conventions
- Comprehensive README.md in enrichment/ package
- Public API exports in __init__.py
- Strong documentation per module
Risk 3: Testing Overhead
Risk: More components = more tests to write and maintain.
Mitigation:
- Isolated testing is actually easier (mock dependencies)
- Reuse test fixtures across similar tests
- Focus on unit tests first (faster, easier)
- Integration tests for end-to-end validation
Risk 4: Migration Bugs
Risk: Logic might be lost or changed during refactoring.
Mitigation:
- Extract methods first, refactor second
- Run existing tests after each extraction
- Add new tests alongside extraction
- Manual testing of key workflows
Success Criteria
Implementation Complete When:
- [ ] All 14 new files created and tested
- [ ] Unit test coverage >85% per module
- [ ] Integration tests pass
- [ ] All existing tests still pass
- [ ] Documentation complete (READMEs, docstrings)
- [ ] Code review approved
Migration Complete When:
- [ ] All callers updated to new API
- [ ] Old backend/epgoat/services/api_enrichment.py deleted
- [ ] All tests pass
- [ ] No references to old API remain
- [ ] Performance benchmarks show no regression
Success Metrics:
- [ ] File size <300 lines per file
- [ ] Function size <50 lines per function
- [ ] 100% SOLID compliance
- [ ] Test coverage >85%
- [ ] No mypy errors
- [ ] No Ruff violations
- [ ] CI pipeline passes
References
Related Documentation
- Engineering Standards: Core Principles
- Engineering Standards: Python
- Engineering Standards: Architecture Patterns
Related Issues
- Phase 2 Code Review (2025-11-03)
- api_enrichment.py 7-point inspection findings
Implementation Plan
Once design is approved, use superpowers:writing-plans to create detailed implementation plan with:
- Exact file locations
- Complete code examples
- Step-by-step verification
- Zero-context engineer handoff
Design Status: ✅ Approved - Ready for Implementation Planning